NSF PAR Search | NSF Public Access Repository


A supervised Bayesian factor model for the identification of multi-omics signatures

https://doi.org/10.1093/bioinformatics/btae202

Gygi, Jeremy P.; Konstorum, Anna; Pawar, Shrikant; Aron, Edel; Kleinstein, Steven H.; Guan, Leying; Kendziorski, Christina (ed.) (April 2024, Bioinformatics)

Abstract

Motivation: Predictive biological signatures provide utility as biomarkers for disease diagnosis and prognosis, as well as prediction of responses to vaccination or therapy. These signatures are identified from high-throughput profiling assays through a combination of dimensionality reduction and machine learning techniques. The genes, proteins, metabolites, and other biological analytes that compose signatures also generate hypotheses on the underlying mechanisms driving biological responses, thus improving biological understanding. Dimensionality reduction is a critical step in signature discovery to address the large number of analytes in omics datasets, especially for multi-omics profiling studies with tens of thousands of measurements. Latent factor models, which can account for the structural heterogeneity across diverse assays, effectively integrate multi-omics data and reduce dimensionality to a small number of factors that capture correlations and associations among measurements. These factors provide biologically interpretable features for predictive modeling. However, multi-omics integration and predictive modeling are generally performed independently in sequential steps, leading to suboptimal factor construction. Combining these steps can yield better multi-omics signatures that are more predictive while still being biologically meaningful.

Results: We developed a supervised variational Bayesian factor model that extracts multi-omics signatures from high-throughput profiling datasets that can span multiple data types. Signature-based multiPle-omics intEgration via lAtent factoRs (SPEAR) adaptively determines factor rank, emphasis on factor structure, data relevance and feature sparsity. The method improves the reconstruction of underlying factors in synthetic examples and prediction accuracy of coronavirus disease 2019 severity and breast cancer tumor subtypes.

Availability and implementation: SPEAR is a publicly available R-package hosted at https://bitbucket.org/kleinstein/SPEAR.
Predictive overfitting in immunological applications: Pitfalls and solutions

https://doi.org/10.1080/21645515.2023.2251830

Gygi, Jeremy P.; Kleinstein, Steven H.; Guan, Leying (August 2023, Human Vaccines & Immunotherapeutics)

Overfitting describes the phenomenon where a highly predictive model on the training data generalizes poorly to future observations. It is a common concern when applying machine learning techniques to contemporary medical applications, such as predicting vaccination response and disease status in infectious disease or cancer studies. This review examines the causes of overfitting and offers strategies to counteract it, focusing on model complexity reduction, reliable model evaluation, and harnessing data diversity. Through discussion of the underlying mathematical models and illustrative examples using both synthetic data and published real datasets, our objective is to equip analysts and bioinformaticians with the knowledge and tools necessary to detect and mitigate overfitting in their research.
Full Text Available
Integrated longitudinal multiomics study identifies immune programs associated with acute COVID-19 severity and mortality

https://doi.org/10.1172/JCI176640

Gygi, Jeremy P; Maguire, Cole; Patel, Ravi K; Shinde, Pramod; Konstorum, Anna; Shannon, Casey P; Xu, Leqi; Hoch, Annmarie; Jayavelu, Naresh Doni; Haddad, Elias K; et al. (May 2024, Journal of Clinical Investigation)

Full Text Available
